1 Combining Multiple Datasets in a Likelihood Analysis : Which Models are Best ?

نویسندگان

  • Dorothée Huchon
  • Ying Cao
  • Norihiro Okada
  • Masami Hasegawa
چکیده

Until recently, phylogenetic analyses have been routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies are now based on the analysis of multiple genes. Thus, it has become necessary to devise statistical methods to combine multiple molecular datasets. Here, we compare several models for combining different genes for the purpose of evaluating the likelihood of tree topologies. Three methods of branch length estimation were studied: assuming all genes have the same branch lengths (concatenate model); assuming that branch lengths are proportional among genes (proportional model); or assuming that each gene has a separate set of branch lengths (separate model). We also compared three models of among-site rate variation: the homogenous model, a model that assumes one gamma parameter for all genes, and a model that assumes one gamma parameter for each gene. On the basis of two nuclear and one mitochondrial amino-acid datasets, our results suggest that, depending on the dataset chosen, either the separate model or the proportional model represent the most appropriate method for branch length analysis. For all datasets examined, one gamma parameter to each gene represents the best model for among-site rate variation. Using these models, we analyzed alternative mammalian tree topologies and describe the effect of the assumed model on the maximum likelihood tree. We show that the choice of the model has an impact on the best phylogeny obtained.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automated Detection of Multiple Sclerosis Lesions Using Texture-based Features and a Hybrid Classifier

Background: Multiple Sclerosis (MS) is the most frequent non-traumatic neurological disease capable of causing disability in young adults. Detection of MS lesions with magnetic resonance imaging (MRI) is the most common technique. However, manual interpretation of vast amounts of data is often tedious and error-prone. Furthermore, changes in lesions are often subtle and extremely unrepresentati...

متن کامل

Comparison of Artificial Neural Network, Decision Tree and Bayesian Network Models in Regional Flood Frequency Analysis using L-moments and Maximum Likelihood Methods in Karkheh and Karun Watersheds

Proper flood discharge forecasting is significant for the design of hydraulic structures, reducing the risk of failure, and minimizing downstream environmental damage. The objective of this study was to investigate the application of machine learning methods in Regional Flood Frequency Analysis (RFFA). To achieve this goal, 18 physiographic, climatic, lithological, and land use parameters were ...

متن کامل

Evaluation the Mean Performance and Stability of Rice Genotypes by Combining Features of AMMI and BLUP Techniques and Selection Based on Multiple Traits

Additive main effect and multiplicative interaction (AMMI) and best linear unbiased prediction (BLUP) are two methods for analyzing multi-environment trials (MET). In this study, seven selected rice lines were evaluated along with two check varieties based on randomized complete block design in Tonekabon, Amol and Sari (Iran) in three growing seasons of 2011-14. To quantify the genotypic stabil...

متن کامل

Combining multiple data sets in a likelihood analysis: which models are the best?

Until recently, phylogenetic analyses have been routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies are now based on the analysis of multiple genes. Thus, it has become necessary to devise statistical methods to combine multiple molecular data sets. Here, we compare several models for combining different genes for...

متن کامل

Efficiency evaluation of wheat farming: a network data envelopment analysis approach

Traditional data envelopment analysis (DEA) models deal with measurement of relative efficiency of decision making units (DMUs) in which multiple-inputs consumed to produce multiple-outputs. One of the drawbacks of these models is neglecting internal processes of each system, which may have intermediate products and/or independent inputs and/or outputs. In this paper some methods which are usab...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002